A Forensic Text Comparison in SMS Messages: A Likelihood Ratio Approach with Lexical Features

Author

  • Shunichi Ishihara
Abstract

Because of its convenience and low cost, the short message service (SMS) has long been a very popular medium of communication. Unfortunately, SMS messages are sometimes used for reprehensible purposes, e.g. communication between drug dealers and buyers, or in illicit acts such as extortion, fraud, scams, hoaxes, and false reports of terrorist threats. In this study, we perform a likelihood-ratio-based forensic text comparison of SMS messages, focusing on lexical features. The likelihood ratios (LRs) are calculated using Aitken and Lucy’s (2004) multivariate kernel density procedure and are then calibrated. The validity of the system is assessed, based on the magnitude of the LRs, using the log-likelihood-ratio cost (Cllr). The strength of the derived LRs is presented graphically in Tippett plots. The results of the current study are compared with those of previous studies.
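
As a rough illustration of the log-likelihood-ratio cost mentioned in the abstract, the following minimal Python sketch computes Cllr under its commonly used definition: the average of the mean of log2(1 + 1/LR) over same-author comparisons and the mean of log2(1 + LR) over different-author comparisons. The function name `cllr`, its parameters, and the LR values are assumptions chosen for illustration; they are not taken from the study, and this is not necessarily how the author's system computes it.

```python
import math


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood-ratio cost for two lists of (non-log) likelihood ratios."""
    # Same-author comparisons are penalised when the LR is small (below 1).
    ss_penalty = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs) / len(same_author_lrs)
    # Different-author comparisons are penalised when the LR is large (above 1).
    ds_penalty = sum(math.log2(1 + lr) for lr in different_author_lrs) / len(different_author_lrs)
    return 0.5 * (ss_penalty + ds_penalty)


# Hypothetical LR values, for illustration only (not results from the study).
uninformative = cllr([1.0, 1.0], [1.0, 1.0])    # every LR equals 1
informative = cllr([100.0, 50.0], [0.01, 0.02])  # LRs point the right way
```

Under this definition, a system that always outputs LR = 1 scores a Cllr of exactly 1, and lower values indicate better performance, so `informative` comes out well below `uninformative`.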


Similar resources

Probabilistic Evaluation of SMS Messages as Forensic Evidence: Likelihood Ratio Based Approach with Lexical Features

This study is one of the first likelihood ratio-based forensic text comparison studies in forensic authorship analysis. The likelihood-ratio-based evaluation of scientific evidence has started being adopted in many disciplines of forensic evidence comparison sciences, such as DNA, handwriting, fingerprints, footwear, voice recording, etc., and it is largely accepted that this is the way to ensu...


A Forensic Authorship Classification in SMS Messages: A Likelihood Ratio Based Approach Using N-gram

Due to its convenience and low cost, short message service (SMS) has been a very popular medium for communication for quite some time. Unfortunately, however, SMS messages are sometimes used in illicit acts, such as communication between drug dealers and buyers, extortion, fraud, scams, hoaxes, false reports of terrorist threats, and many more. This study is a forensic study on the authorship clas...


An Effective Model for SMS Spam Detection Using Content-based Features and Averaged Neural Network

In recent years, there has been considerable interest in using short message service (SMS) as one of the essential and straightforward communication services on mobile devices. The increased popularity of this service has also increased the number of attacks on mobile devices, such as SMS spam messages. SMS spam messages constitute a real problem to mobile subscribers; this worries telecomm...


An Effect of Background Population Sample Size on the Performance of a Likelihood Ratio-based Forensic Text Comparison System: A Monte Carlo Simulation with Gaussian Mixture Model

This is a Monte Carlo simulation-based study that explores the effect of the sample size of the background database on a likelihood ratio (LR)-based forensic text comparison (FTC) system built on multivariate authorship attribution features. The text messages written by 240 authors who were randomly selected from an archive of chatlog messages were used in this study. The strength of evidence (...


Recovering dropped pronouns from Chinese text messages

Pronouns are frequently dropped in Chinese sentences, especially in informal data such as text messages. In this work we propose a solution to recover dropped pronouns in SMS data. We manually annotate dropped pronouns in 684 SMS files and apply machine learning algorithms to recover them, leveraging lexical, contextual and syntactic information as features. We believe this is the first work on...




Publication date: 2012